161 research outputs found

    Reducing Branch Misprediction Penalty through Confidence Estimation

    The goal of this thesis is to reduce the overall penalty associated with branch mispredictions, in terms of both performance degradation and energy consumption, through the use of confidence estimation. This reduction is achieved, first, by increasing the accuracy of branch predictors; next, by reducing the time needed to restore the processor after a mispredicted branch; and finally, by reducing the energy consumed executing incorrect instructions. All these proposals rely on confidence estimation, a mechanism that assesses the quality of branch predictions by estimating the probability that a dynamic branch prediction is correct or incorrect. Summary of the thesis presented by the author at the Universidad de Murcia (2003), Facultad de Informática.
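
    A common way to realize such an estimator (sketched below purely for illustration, not taken from the thesis) is a table of saturating counters indexed by a hash of the branch address and the global history, in the spirit of JRS-style confidence estimators. The table size, counter width, and confidence threshold in this minimal Python model are assumptions chosen for readability.

        # Minimal sketch of a table-based branch confidence estimator
        # (JRS-style resetting counters). Table size, counter width and
        # threshold are illustrative assumptions, not values from the thesis.
        class ConfidenceEstimator:
            def __init__(self, entries=4096, ctr_max=15, threshold=12):
                self.entries = entries        # number of table entries
                self.ctr_max = ctr_max        # saturating-counter ceiling
                self.threshold = threshold    # value above which a prediction is trusted
                self.table = [0] * entries
                self.ghr = 0                  # global branch history register

            def _index(self, pc):
                # Hash the branch PC with the global history to pick a counter.
                return (pc ^ self.ghr) % self.entries

            def is_high_confidence(self, pc):
                # Queried alongside the branch prediction.
                return self.table[self._index(pc)] >= self.threshold

            def update(self, pc, prediction_was_correct, taken):
                i = self._index(pc)
                if prediction_was_correct:
                    # Correct predictions slowly build confidence.
                    self.table[i] = min(self.table[i] + 1, self.ctr_max)
                else:
                    # A misprediction resets the counter to "low confidence".
                    self.table[i] = 0
                # Shift the actual outcome into the global history.
                self.ghr = ((self.ghr << 1) | int(taken)) & (self.entries - 1)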

    Control speculation for energy-efficient next-generation superscalar processors

    Conventional front-end designs attempt to maximize the number of "in-flight" instructions in the pipeline. However, branch mispredictions cause the processor to fetch useless instructions that are eventually squashed, increasing front-end energy and issue-queue utilization and thus wasting around 30 percent of the power dissipated by the processor. Furthermore, processor design trends lead to higher clock frequencies obtained by lengthening the pipeline, which puts more pressure on the branch prediction engine since branches take longer to resolve. As next-generation high-performance processors become more deeply pipelined, the amount of energy wasted on misspeculated instructions will grow. The aim of this work is to reduce the energy consumption of misspeculated instructions. We propose selective throttling, which triggers different power-aware techniques (fetch throttling, decode throttling, or disabling the selection logic) depending on the branch prediction confidence level. Results show that combining fetch-bandwidth reduction with select-logic disabling provides the best overall energy reduction and energy-delay product improvement (14 percent and 10 percent, respectively, for a processor with a 22-stage pipeline, and 16 percent and 13 percent, respectively, for a processor with a 42-stage pipeline).
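
    As a rough illustration of the decision logic just described, the hypothetical Python sketch below picks power-aware actions from the number of unresolved low-confidence branches in flight; the thresholds and the use of a simple in-flight count are assumptions made for the sketch, not the exact policy of the paper.

        # Hypothetical sketch of selective throttling: choose front-end
        # power-saving actions based on how many unresolved low-confidence
        # branches are in flight. Thresholds are illustrative assumptions.
        def select_throttle_actions(low_conf_branches_in_flight):
            """Return the set of power-aware actions to trigger this cycle."""
            actions = set()
            if low_conf_branches_in_flight >= 1:
                actions.add("throttle_fetch")        # reduce fetch bandwidth
            if low_conf_branches_in_flight >= 2:
                actions.add("throttle_decode")       # slow down the decode stage
            if low_conf_branches_in_flight >= 3:
                actions.add("disable_select_logic")  # gate the issue-queue selection logic
            return actions

        # Example: with two unresolved low-confidence branches, fetch and
        # decode are throttled but the selection logic keeps running.
        print(select_throttle_actions(2))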

    Industrial trajectories and local governance in intermediate cities of the Madrid periphery: the cases of Getafe and Alcalá de Henares

    In recent years, interest in the role of intermediate cities in territorial development has become a central issue in urban development studies. More specifically, the construction of metropolitan areas based on polycentric urban structures raises the question of the role these cities play in shaping metropolitan peripheries, and of their ability to overcome the challenges posed by the particular conditions of these environments. In this paper, we present the cases of Getafe and Alcalá de Henares, both characterized by outstanding development trajectories in the Madrid metropolitan context for their success in building a development model centered on the industrial sector. It is argued that this has been possible thanks to the innovative capacity of local firms and to the characteristics of their respective institutional contexts. The research is based on 68 interviews conducted in both cities as well as on the quantitative sources commonly used in studies of industrial activity and urban development.

    Optimization and improvement in "Ampliación de Estructura de Computadores"

    This paper describes several changes related to the theoretical and practical contents, the course assessment, and the improvement of several innovation activities carried out in the subject "Ampliación de Estructura de Computadores" (Advanced Computer Structure) during the 2017/18 academic year. The course is taught in the second year of the Computer Engineering Degree at the University of Murcia and comprises six ECTS credits (3 credits for theory and practice), within which students have 150 hours to attend lectures and lab sessions, do their independent work, and complete the corresponding assessment. Overall, a detailed analysis of the course contents aimed at better planning and a better fit to the available time for both theory and practice, the establishment of continuous and diverse assessment, and a more effective use of the Virtual Classroom (based on Sakai) of the University of Murcia have all contributed to improving students' academic results in the 2017/18 academic year.

    TCOR: a tile cache with optimal replacement

    Cache replacement policies are known to have an important impact on hit rates. The OPT replacement policy [27] has been formally proven optimal for minimizing misses. Because it needs to look arbitrarily far ahead at future memory accesses, it is usually relegated to a yardstick for measuring the efficacy of practical caches. In this paper, we bring OPT to life in architectures for mobile GPUs, for which energy efficiency is of great consequence, and we also adapt other parts of the memory hierarchy to enhance its impact. The end results are a 13.8% decrease in memory hierarchy energy consumption and increased throughput in the Tiling Engine. We also observe a 5.5% decrease in total GPU energy and a 3.7% increase in frames per second (FPS). This work has been supported by the CoCoUnit ERC Advanced Grant of the EU's Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, the ICREA Academia program, and the AGAUR grant 2020-FISDU-00287. We would also like to thank the anonymous reviewers for their valuable comments.
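
    For reference, OPT (Belady's algorithm) evicts the resident block whose next use lies farthest in the future. The short Python sketch below simulates that policy over a fully known access trace; it is a generic illustration of the yardstick policy, not the TCOR hardware mechanism, and the trace and cache size are made up.

        # Generic simulation of Belady's OPT replacement over a known trace:
        # on a miss with a full cache, evict the block whose next reference is
        # farthest in the future (or that is never referenced again).
        def opt_misses(trace, capacity):
            cache, misses = set(), 0
            for i, block in enumerate(trace):
                if block in cache:
                    continue
                misses += 1
                if len(cache) >= capacity:
                    def next_use(b):
                        # Position of the next reference to b, or infinity.
                        for j in range(i + 1, len(trace)):
                            if trace[j] == b:
                                return j
                        return float("inf")   # never used again: ideal victim
                    cache.remove(max(cache, key=next_use))
                cache.add(block)
            return misses

        # Tiny made-up tile-access trace and a 2-entry cache (5 misses).
        print(opt_misses(["A", "B", "C", "A", "B", "D", "A"], capacity=2))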

    DTM-NUCA: dynamic texture mapping-NUCA for energy-efficient graphics rendering

    Modern mobile GPUs integrate an increasing number of shader cores to speed up the execution of graphics workloads. Each core integrates a private Texture Cache to apply texturing effects to objects, backed by a shared L2 cache. However, as in any other memory hierarchy, this organization replicates data in the upper levels (i.e., the private Texture Caches) to allow faster accesses at the expense of reducing their overall effective capacity. For example, in a mobile GPU with four shader cores, about 84.6% of the requested texture blocks are replicated in at least one of the other private Texture Caches. This paper proposes a novel dynamically-mapped Non-Uniform Cache Architecture (NUCA) organization for the private Texture Caches of a mobile GPU, aimed at increasing their effective overall capacity and decreasing the overall access latency by attacking data replication. A block missing in a local Texture Cache may be serviced by a remote one at a cost smaller than a round trip to the shared L2. The proposed Dynamic Texture Mapping-NUCA (DTM-NUCA) features a lightweight mapping table, called the Affinity Table, whose size is independent of the L2 cache size, unlike a traditional NUCA organization. The best owner for a given set of blocks is dynamically determined and stored in the Affinity Table to maximize local accesses. The mechanism also allows a certain amount of replication to favor local accesses where appropriate, without hurting performance, since the capacity lost to the allowed replication is small. DTM-NUCA is presented in two flavors: one with a centralized Affinity Table and another with a distributed Affinity Table. Experimental results first show that L2 pressure is effectively reduced, eliminating 41.8% of L2 accesses on average. As for average latency, DTM-NUCA does a very effective job of maximizing local over remote accesses, achieving 73.8% local accesses on average. As a consequence, our novel DTM-NUCA organization obtains an average speedup of 16.9% and overall energy savings of 7.6% over a conventional organization. This work has been supported by the CoCoUnit ERC Advanced Grant of the EU's Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, the ICREA Academia program, and a research fellowship from the University of Murcia's "Plan Propio de Investigación".
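
    The lookup flow can be pictured roughly as follows: on a local Texture Cache miss, a small Affinity Table maps the block's group to a likely owner core, whose cache is probed before falling back to the shared L2. In the Python sketch below, the table size, the block-to-group mapping, and the "last core to fill from L2 becomes the owner" heuristic are illustrative assumptions, not the exact DTM-NUCA design.

        # Rough sketch of a DTM-NUCA-style lookup. The Affinity Table maps
        # groups of texture blocks to an "owner" core whose private Texture
        # Cache is probed on a local miss before going to the shared L2.
        AFFINITY_ENTRIES = 256    # hypothetical number of Affinity Table entries

        class DTMNuca:
            def __init__(self, texture_caches, l2):
                self.tcaches = texture_caches            # per-core caches (dict-like)
                self.l2 = l2                             # shared L2 (dict-like)
                self.affinity = [0] * AFFINITY_ENTRIES   # block group -> owner core id

            def _group(self, block_addr):
                return block_addr % AFFINITY_ENTRIES

            def read(self, core_id, block_addr):
                # 1) Local Texture Cache hit: the cheapest case.
                if block_addr in self.tcaches[core_id]:
                    return self.tcaches[core_id][block_addr]
                # 2) Probe the predicted owner's Texture Cache (remote, but
                #    cheaper than a round trip to the shared L2).
                owner = self.affinity[self._group(block_addr)]
                if owner != core_id and block_addr in self.tcaches[owner]:
                    return self.tcaches[owner][block_addr]
                # 3) Fall back to the shared L2, fill the local cache, and
                #    record this core as the owner of the block's group.
                data = self.l2[block_addr]
                self.tcaches[core_id][block_addr] = data
                self.affinity[self._group(block_addr)] = core_id
                return data

    The same lookup structure works whether the Affinity Table is kept in one place or sliced across the cores, which is essentially the difference between the centralized and distributed flavors mentioned above.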

    MEGsim: A Novel methodology for efficient simulation of graphics workloads in GPUs

    An important drawback of cycle-accurate microarchitectural simulators is that they are several orders of magnitude slower than the systems they model. This becomes a serious issue when simulations have to be repeated many times while sweeping over the desired design space. In the specific context of graphics workloads, cycle-accurate simulation is even more demanding due to the large number of triangles that have to be shaded, lit, and textured to compose a single frame. As a result, simulating even a few minutes of a video game sequence is extremely time-consuming. In this paper, we make the observation that collecting information about the vertices and primitives that are processed, along with the number of times shader programs are invoked, allows us to characterize the activity performed in a given frame. Based on that, we propose MEGsim, a novel methodology for the efficient simulation of graphics workloads that accurately characterizes entire video sequences using a small subset of selected frames, which substantially reduces simulation time. For a set of popular Android games, we show that MEGsim achieves an average simulation speedup of 126× while producing remarkably accurate final statistics, e.g., with average relative errors of just 0.84% for the total number of cycles, 0.99% for the number of DRAM accesses, 1.2% for the number of L2 cache accesses, and 0.86% for the number of L1 (tile cache) accesses. This work has been supported by the CoCoUnit ERC Advanced Grant of the EU's Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, and the ICREA Academia program.
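
    One way to picture the idea is to describe each frame by a small activity vector (vertices, primitives, shader invocations), group similar frames, simulate one representative per group in detail, and weight its statistics by the group size. The Python sketch below uses a deliberately crude grouping (coarse quantization of the activity vector) purely as an illustration; MEGsim's actual selection and weighting machinery is more involved.

        # Illustrative sketch of the frame-sampling idea: characterize each
        # frame by an activity vector, group similar frames, simulate only one
        # representative per group, and scale its statistics by the group size.
        from collections import defaultdict

        def pick_representatives(frame_activity, bucket=1000):
            """frame_activity: list of (vertices, primitives, shader_invocations)."""
            groups = defaultdict(list)
            for frame_id, vec in enumerate(frame_activity):
                key = tuple(v // bucket for v in vec)   # coarse signature of the frame
                groups[key].append(frame_id)
            # One representative frame per group plus how many frames it stands for.
            return [(members[0], len(members)) for members in groups.values()]

        def estimate_total(representatives, simulate_frame):
            """simulate_frame(frame_id) -> a per-frame statistic (e.g., cycles)."""
            return sum(simulate_frame(fid) * weight for fid, weight in representatives)

        # Made-up activity vectors for six frames and a toy "simulator".
        frames = [(1000, 200, 50_000), (1020, 210, 50_500), (5000, 900, 240_000),
                  (5100, 950, 241_000), (1010, 205, 50_200), (5050, 920, 240_500)]
        reps = pick_representatives(frames)
        print(estimate_total(reps, simulate_frame=lambda fid: sum(frames[fid])))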

    Fast and accurate SER estimation for large combinational blocks in early stages of the design

    Soft Error Rate (SER) estimation is an important challenge for integrated circuits because of the increased vulnerability brought by technology scaling. This paper presents a methodology to estimate, in the early stages of the design, the susceptibility of combinational circuits to particle strikes. At the core of the framework lies MASkIt, a novel approach that combines signal probabilities with technology characterization to swiftly compute the logical, electrical, and timing masking effects of the circuit under study, taking into account all input combinations and pulse widths at once. Signal probabilities are estimated with a new hybrid approach that integrates heuristics with selective simulation of reconvergent subnetworks. The experimental results validate the proposed technique, showing a speedup of two orders of magnitude over traditional fault-injection estimation with an average estimation error of 5 percent. Finally, we analyze the vulnerability of the Decoder, Scheduler, ALU, and FPU of an out-of-order, superscalar processor design. This work has been partially supported by the Spanish Ministry of Economy and Competitiveness and FEDER funds under grant TIN2013-44375-R, by the Generalitat de Catalunya under grant FI-DGR 2016, and by the FP7 program of the EU under contract FP7-611404 (CLERECO).
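
    To give a flavor of how signal probabilities feed into logical masking, the Python sketch below propagates the probability that each net is 1 through a tiny gate-level netlist under an independence assumption. It is a textbook-style illustration only, not the MASkIt algorithm, which additionally handles reconvergent fanout as well as electrical and timing masking.

        # Toy illustration of signal-probability propagation for logical
        # masking. Assumes independent nets; reconvergent fanout, electrical
        # and timing masking (all handled by MASkIt) are ignored here.
        def propagate_probabilities(netlist, input_probs):
            """netlist: list of (output, gate, inputs) in topological order."""
            p = dict(input_probs)                  # P(net == 1)
            for out, gate, ins in netlist:
                if gate == "AND":
                    prob = 1.0
                    for i in ins:
                        prob *= p[i]
                elif gate == "OR":
                    prob = 1.0
                    for i in ins:
                        prob *= (1.0 - p[i])
                    prob = 1.0 - prob
                elif gate == "NOT":
                    prob = 1.0 - p[ins[0]]
                else:
                    raise ValueError(f"unsupported gate {gate}")
                p[out] = prob
            return p

        # A fault on one input of a 2-input AND gate is logically masked
        # whenever the other input is 0, i.e. with probability 1 - P(other == 1).
        netlist = [("n1", "AND", ["a", "b"]), ("z", "OR", ["n1", "c"])]
        probs = propagate_probabilities(netlist, {"a": 0.5, "b": 0.5, "c": 0.5})
        print(probs["z"])   # 0.625 under the independence assumption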